Enhancing the selection of a model-based clustering with external qualitative variables
Authors
Abstract
In cluster analysis, it is often useful to interpret the obtained partition with respect to external qualitative variables (defining known partitions) derived from alternative information. An approach is proposed in the model-based clustering context to select a model and a number of clusters so as to obtain a partition that both fits the data well and is related to the external variables. This approach makes use of the integrated joint likelihood of the data, the partition derived from the mixture model, and the known partitions. It is worth noting that the external qualitative variables are used only to select a relevant mixture model. Each mixture model is fitted to the observed data by maximum likelihood. Numerical experiments illustrate the promising behaviour of the derived criterion.

Key-words: Model-based Clustering, External Qualitative Variables, Model Selection, Integrated Completed Likelihood, ICL

∗ LSTA, Université Pierre et Marie Curie Paris VI
† BRU-UNIDE, ISCTE-IUL
‡ INRIA Saclay Île-de-France, Projet select, Université Paris-Sud 11
§ ISEL, ISCTE-Lisbon University Institute
¶ ProjAVI (MEC), BRU-UNIDE & CEAUL

hal-00747387, version 1 - 31 Oct 2012

Sélection d'un modèle de classification tenant compte de variables qualitatives illustratives (Selecting a clustering model that takes illustrative qualitative variables into account)

Résumé: In unsupervised classification, it is often useful to interpret the clustering using external qualitative variables that themselves define partitions. We propose an approach based on the probability mixture model that selects a model and a number of clusters yielding both a good fit to the data and a strong association with the external qualitative variables. The approach relies on a criterion approximating the integrated likelihood of the data completed by the labels of the sought partition and by those of the partitions associated with the external variables. It is important to stress that the external variables are used only to select a mixture model, which is estimated by maximum likelihood. Numerical illustrations show the promising behaviour of the proposed criterion.

Mots-clés: Mixture model, external qualitative variables, model selection, integrated completed likelihood, ICL
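The abstract describes favouring partitions that are related to known external partitions, without giving the formula of the criterion. As a rough illustration only, the sketch below measures that relation by the maximized multinomial log-likelihood of the external labels given the cluster labels, and adds it to a precomputed ICL value. The term itself, the function names external_link_term and penalized_score, and the ICL inputs are all assumptions made for this example, not the criterion defined in the paper.

```python
import math
from collections import Counter


def external_link_term(clusters, external):
    """Maximized log-likelihood of external labels given cluster labels.

    Computes sum over (k, l) of n_kl * log(n_kl / n_k), where n_kl counts
    observations in cluster k with external label l and n_k is the size of
    cluster k. The value is 0 when each cluster holds a single external
    label and becomes more negative as the two partitions decouple.
    """
    if len(clusters) != len(external):
        raise ValueError("label sequences must have the same length")
    n_k = Counter(clusters)
    n_kl = Counter(zip(clusters, external))
    return sum(n * math.log(n / n_k[k]) for (k, _), n in n_kl.items())


def penalized_score(icl, clusters, external_partitions):
    """Hypothetical combined score: ICL plus one link term per external partition."""
    return icl + sum(external_link_term(clusters, u) for u in external_partitions)
```

Under these assumptions, a candidate mixture model would be scored from its own ICL value and its estimated partition, and the model with the largest combined score would be retained. The external variables play no part in fitting the mixtures, which matches the abstract's statement that they are used only for selection.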
Similar resources
Steel Consumption Forecasting Using Nonlinear Pattern Recognition Model Based on Self-Organizing Maps
Steel consumption is a critical factor affecting pricing decisions and a key element to achieve sustainable industrial development. Forecasting future trends of steel consumption based on analysis of nonlinear patterns using artificial intelligence (AI) techniques is the main purpose of this paper. Because there are several features affecting target variable which make the analysis of relations...
Choosing the Best Hierarchical Clustering Technique Based on Principal Components Analysis for Suspended Sediment Load Estimation
1- INTRODUCTION The assessment of watershed sediment load is necessary for controlling soil erosion and reducing the potential for sediment production. Differing estimates of sediment amounts, along with the lack of long-term measurements, limit access to reliable data series of erosion rate and sediment yield. Therefore, the observed data of suspended sediment load could be used to ...
On Model-Based Clustering, Classification, and Discriminant Analysis
The use of mixture models for clustering and classification has burgeoned into an important subfield of multivariate analysis. These approaches have been around for a half-century or so, with significant activity in the area over the past decade. The primary focus of this paper is to review work in model-based clustering, classification, and discriminant analysis, with particular attenti...
Determination and prioritization of the external needs of the physical-oriented neighborhood self-help centers using AHP method
Background and objective: Earthquake emergency is one of the most important urban emergencies in Iran and around the world, and it needs to be addressed through a wide range of actions. In past earthquakes, the official rescue and relief teams usually did not reach the stricken area on time, and the number of fatalities increased. Neighborhood self-help center is a new physical infrastructure that has...
Testing Several Rival Models Using the Extension of Vuong's Test and Quasi Clustering
The two main goals in model selection are, first, to introduce an approach for testing the homogeneity of several rival models and, second, to select a set of reasonable models or to estimate the rival model closest to the true one. In this paper we extend Vuong's method for several models to cluster them. Based on the working paper of Katayama (2008), we propose an approach to test whether rival models h...